Improving Translation Fluency with Search-Based Decoding and a Monolingual Statistical Machine Translation Model for Automatic Post-Editing

Authors

  • Jing-Shin Chang
  • Sheng-Sian Lin
Abstract

The BLEU scores and translation fluency of current state-of-the-art SMT systems based on IBM models are still too low for publication purposes. The major issue is that stochastically generated sentence hypotheses, produced through a stack decoding process, may not strictly follow the grammar of the natural target language: the decoding process is directed by a highly simplified translation model and an n-gram language model, and a large number of noisy phrase pairs may introduce significant search errors. This paper proposes a statistical post-editing (SPE) model, based on a special monolingual SMT paradigm, to “translate” disfluent sentences into fluent sentences. Instead of conducting a stack decoding process, however, the sentence hypotheses are searched from fluent target sentences in a large target language corpus or on the Web to ensure fluency. Phrase-based local editing, if necessary, is then applied to correct the weakest phrase alignments between the disfluent sentence and the searched hypotheses using fluent target language phrases. Such phrases are segmented from a large target language corpus with a global optimization criterion that maximizes the likelihood of the training sentences, instead of using noisy phrases combined from bilingually word-aligned pairs. With such search-based decoding, the absolute BLEU scores are much higher than those of automatic post-editing systems that conduct a classical SMT decoding process. We are also able to fully correct a significant number of disfluent sentences into completely fluent versions. The evaluation shows that, on average, 46% of translation errors can be fully recovered and the BLEU score can be improved by about 26%.
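
The search-then-edit idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the function names `retrieve_fluent_hypothesis` and `mismatched_spans` are hypothetical, the toy corpus is invented for illustration, and token-level similarity from Python's standard `difflib` module stands in for the statistical scoring that the paper would use to rank hypotheses and find the weakest phrase alignments.

```python
from difflib import SequenceMatcher


def retrieve_fluent_hypothesis(disfluent, corpus):
    """Return the corpus sentence most similar to the disfluent input.

    Assumption: token-level SequenceMatcher ratio is used as a stand-in
    for a real retrieval score over a large corpus or the Web.
    """
    src = disfluent.split()
    return max(corpus, key=lambda s: SequenceMatcher(None, src, s.split()).ratio())


def mismatched_spans(disfluent, hypothesis):
    """List (disfluent_phrase, hypothesis_phrase) pairs where the sentences differ.

    These spans are candidates for phrase-based local editing; an empty
    string means the phrase exists on one side only.
    """
    a, b = disfluent.split(), hypothesis.split()
    ops = SequenceMatcher(None, a, b).get_opcodes()
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in ops
        if tag != "equal"
    ]


# Toy fluent corpus (invented for illustration only).
corpus = [
    "he went to the school yesterday",
    "the cat sat on the mat",
    "she likes green tea",
]
disfluent = "he go to school yesterday"

hypothesis = retrieve_fluent_hypothesis(disfluent, corpus)
print(hypothesis)
print(mismatched_spans(disfluent, hypothesis))
```

Because the hypothesis is taken from fluent text rather than assembled by a decoder, it is fluent by construction; only the mismatched spans need to be checked and, where the retrieved sentence drifts from the intended meaning, edited locally.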

Similar resources

Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

A novel and robust approach to improving statistical machine translation fluency is developed within a minimum Bayes-risk decoding framework. By segmenting translation lattices according to confidence measures over the maximum likelihood translation hypothesis, we are able to focus on regions with potential translation errors. Hypothesis space constraints based on monolingual coverage are applied...

USAAR-SAPE: An English-Spanish Statistical Automatic Post-Editing System

We describe the USAAR-SAPE English–Spanish Automatic Post-Editing (APE) system submitted to the APE Task organized in the Workshop on Statistical Machine Translation (WMT) in 2015. Our system was able to improve upon the baseline MT system output by incorporating Phrase-Based Statistical MT (PBSMT) techniques into the monolingual Statistical APE task (SAPE). The reported final submission crucia...

Community-based post-editing of machine-translated content: monolingual vs. bilingual

We carried out a machine-translation post-editing pilot study with users of an IT support forum community. For each of the two language pairs (English to German, English to French), 4 native speakers of the target language were recruited. They performed monolingual and bilingual post-editing tasks on machine-translated forum content. The post-edited content was evaluated using human evaluation (fluency, comprehe...

Lattice rescoring methods for statistical machine translation

Modern statistical machine translation (SMT) systems include multiple interrelated components, statistical models, and processes. Translation is often factored as a cascaded series of modules such that the output of one module serves as the input to the next; this is the SMT pipeline. Simplifying assumptions, limited training data, and pruning during search mean that the hypothesis produced by ...

Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System

Conventional rule-based machine translation systems suffer from weak fluency in target language generation. In particular, when translating spoken English into Korean, the fluency of the translation result is as important as its adequacy for readability and understanding. This problem is more severe in language pairs such as English-Korean. It’s because Englis...

Publication date: 2009